Prediction of Lysine Ubiquitylation with Ensemble Classifier and Feature Selection

نویسندگان

  • Xiaowei Zhao
  • Xiangtao Li
  • Zhiqiang Ma
  • Minghao Yin
چکیده

Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequences information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forests classifiers and then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection

Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...

متن کامل

سودمندی رگرسیون‌های تجمیعی و روش‌های انتخاب متغیرهای پیش‌بین بهینه در پیش‌بینی بازده سهام

مقاله حاضر به بررسی سودمندی رگرسیون‌های تجمیعی و روش‌های انتخاب متغیرهای پیش‌بین بهینه (شامل روش مبتنی بر همبستگی و ریلیف) برای پیش‌بینی بازده سهام شرکت‌های پذیرفته شده در بورس اوراق بهادار تهران می‌پردازد. به‌منظور ارزیابی عملکرد رگرسیون تجمیعی، معیارهای ارزیابی (شامل میانگین قدرمطلق درصد خطا، مجذور مربع میانگین خطا و ضریب تعیین) مربوط به پیش‌بینی این روش، با رگرسیون خطی و شبکه‌های عصبی مصنوعی...

متن کامل

Fault Detection of Bearings Using a Rule-based Classifier Ensemble and Genetic Algorithm

This paper proposes a reduct construction method based on discernibility matrix simplification. The method works with genetic algorithm. To identify potential problems and prevent complete failure of bearings, a new method based on rule-based classifier ensemble is presented. Genetic algorithm is used for feature reduction. The generated rules of the reducts are used to build the candidate base...

متن کامل

Evaluation of Classifiers in Software Fault-Proneness Prediction

Reliability of software counts on its fault-prone modules. This means that the less software consists of fault-prone units the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...

متن کامل

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has been increased. Credit card fraud is a crucial problem in banking and its danger is over increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 12  شماره 

صفحات  -

تاریخ انتشار 2011